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Abstract 

We propose a new algorithm for binary quantization based on the Belief Prop- 
agation algorithm with decimation over factor graphs of Low Density Generator 
Matrix (LDGM) codes. This algorithm, which we call Bias Propagation (BiP), can 
be considered as a special case of the Survey Propagation algorithm proposed for 
binary quantization by Wainwright et al. [8]. It achieves the same near-optimal 
rate-distortion performance with a substantially simpler framework and 10-100 
times faster implementation. We thus challenge the widespread belief that binary 
quantization based on sparse linear codes cannot be solved by simple Belief Prop- 
agation algorithms. Finally, we give examples of suitably irregular LDGM codes 
that work with the BiP algorithm and show their performance. 

1 Introduction 

Binary quantization is an important problem for lossy source coding and other fields, 
such as information hiding pQ. The recent work of Wainwright et al. [8] shows that 
Low Density Generator Matrix (LDGM) codes combined with Survey Propagation (SP) 
message-passing algorithms can be used to achieve near-optimal binary quantization in 
practice. Theoretical properties of regular LDGM codes for binary quantization were 
studied by Martinian et al. [3] . These results motivated us to study practical algorithms 
for binary quantization using LDGM codes. 

In this paper, we challenge the claim that lossy source coding based on pure Be- 
lief Propagation (BP) as opposed to SP cannot give satisfactory results. We propose 
an approach based on pure Belief Propagation for quantizing random Bernoulli source 
with p = |. This algorithm, which we call Bias Propagation (BiP), achieves the same 
near-optimal rate-distortion performance in comparison with [8]. This work has been 
motivated by applications of binary quantization in steganography (information hiding), 
where the problem appears in a slightly more general setting called weighted binary quan- 
tization. 

In particular, we are quantizing a random 72,-bit source sequence s to the nearest 
codeword c s from an LDGM code C with rate R = — . In the weighted binary quanti- 
zation problem, we replace the Hamming distance with the weighted distortion measure 
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Figure 1: Factor graph representation of a linear code with generator matrix G. 



d e (s, c s ) = i 2Li ^ I s * ~ ( c s)*|, where & e [0, 1]. We call the vector g = (g 1 , . . . , g n ) the 
profile of the weighted binary quantization problem. Our goal is to minimize the average 
distortion D e = E[d e (s, c s )] for a given profile g, where E[-] is the expectation taken over 
all possible source sequences s. In case of uniform profile g = (1,...,1), we denote the 
average distortion as D. The rate-distortion function is in the form R(D) = 1 — H(D) 
for D e [0, 0.5] and otherwise, where H is the binary entropy function. Rate-distortion 
functions for other profiles can be found in pQ. 

The rest of this paper is structured as follows. In Section 2, we describe LDGM 
codes and introduce notation. In Section 3, we present a complete derivation of the BiP 
algorithm using Belief Propagation over factor graphs of LDGM codes. In Section 4, we 
briefly discuss the reason why an approach based on the BP algorithm should give us 
satisfactory results. We make a connection to the recent work of Murayama [5j. The 
paper is concluded in Section 5, where we present some experimental results. 



2 LDGM code representation 

Codes based on sparse generator matrices are duals of LDPC codes. For a given linear 
code C with generator matrix G e {0, i} nxm ; we define the factor graph of this code 
as a graph Q = (V, C, E) with n check nodes C = {1, . . . , n}, m information bits V = 
{1, . . . , m}, and n source bits. An example of a factor graph can be seen in Figure [U We 
will use variables a,b,c e C to denote the check nodes and variables i, j, k e V to denote 
the information bits. Each check node a e C has its associated source bit s a . Check node 
a is connected with information bit i , (a, i) e E, if G Qj j = 1. Finally, we define the sets 
C{i) = {aeC\(a,i)e E), V(a) = {i e V | (a, i) e E}, and V(a) = V(a) u {s a }. 

The factor graph of an LDGM code is obtained randomly using degree distributions 
from the edge perspective (p, A), p(x) = 2^=1 Pi xl ~ l an d A(x) = where pi 

and Aj denote the portion of all edges connected to check nodes and information bits 
with degree i, respectively. The degree of the check node a is defined as the number of 
connected information bits (we do not count the associated source bit). 



3 Bias Propagation algorithm 

Let C be an LDGM code with generator matrix G e {0, i} nxm ; § e {0, 1}™ a fixed source 
sequence, and g a given profile. For a given constant vector 7 = (7 1; . . . , j n ), we define 
the following conditional probability distribution over LDGM codewords 

p( w | s;7 ) = I e -2<7|Gw-s> ; (1) 



Bias Propagation Algorithm (BiP) 



(a) pseudo-code 



procedure w = BiP(G, s) 

G.B_saa = calc_src_msg(s , gamma) /* (BiP-1) */ 
G.S_ai = calc_ai(l, G.B_saa) /* (BiP-4) */ 

while not all_bits_f ixed(w) 
bias = BiP_iter(G, s) 
bias = sort (bias) 
if max( | bias | ) >t 

num = min(num_max, num_of _bits ( Ibias | >t) ) 
else 

num = num_min 
[G,s,w] = dec_most_biased_bits(G s s s w s num) 
end 
end 



procedure bias = BiP_iter(G, s) 

G.B_saa = calc_src_msg(s , gamma) /* (BiP-1) */ 
while iter<max_iter 
G.B_ia_old = G.B.ia 

G.B.ia = calc_ia(G.S_ai) /* (BiP-2) */ 
if iter>start_damp then 

G.B_ia = damping(G.B_ia, G.B_ia_old) /* (BiP-3) */ 
end 

G.S_ai = calc_ai(G.B_ia, G.B_saa)) /* (BiP-4) */ 
iter = iter+1 
end 

bias = calc_bias(G.S_ai) /* (BiP-5) */ 
end 



(b) message-passing update rules 



Source message initialization: 

^IL a = (-irtanh( 7 „) 
Bias update rule: 
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Equation for damping update in £-th iteration: 



B 



Equation for calculating the final bias B{ in £-th iteration: 
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Satisfaction update rule: 
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Constant parameters: 

numjnax. . . max. # of info bits to decimate 
num_min. . . min. # of info bits to decimate 
t. . . decimation threshold 
gamma . . . check node satisfaction strength 
max_iter . . . max. # of iterations in round 
start_damp . . . # iterations without damping 



Figure 2: Summary of the Bias Propagation algorithm. 

where (*y|Gw — s) is the dot product of vectors 7 and Gw — s (calculated in binary 
arithmetic), w e {0, l} m , and Z is a normalization constant Z = Xicee e ~ 2 ^' c ~ s - > . The 
most probable codeword c s = Gw s is the optimal solution to our original problem. The 
vector 7 should be determined from the profile g. 

The BiP is an iterative message-passing algorithm that performs bitwise MAP esti- 
mation of the optimal vector w s for a given source sequence s. This is done in rounds. In 
r-th round, we use the factor graph and source sequence s( r ) to find the most probable 
bits of w s to be fixed. The estimation of these bits is done by max_iter message-passing 
iterations over the factor graph Q( r \ In £-th iteration, the bias messages Bfl a and the 
constant source messages Bs a U. a are sent from information bits and source bits to con- 
nected check nodes. Check nodes send satisfaction messages to their connected 
information bits. Finaly, the most probable information bits are fixed and removed from 
the graph by the decimation process. This results in a new factor graph ^( r+1 ) and a 
source sequence s( r+1 ). We describe the algorithm in a condensed form in Figure [21 Later 
in this section, we derive the update rules. 

Given the original factor graph Q( l > = Q and the initial source sequence = s, 
we start the message-passing process by setting S^Xi = Bisa->a, Va e C. The source 



messages are calculated using equation (BiP-1) and stay constant within one round. 
The parameter 7 a expresses the strength of the check node a. After max_iter message- 
passing iterations using equations (BiP-2)-(BiP-4), we calculate the final bias Bi for 
each information bit i using equation (BiP-5). The bias Bi expresses the difference 
of marginal probabilities P(w, = 0|s;"y) — P(Wj = l|s; -y). We fix the most biased 
information bit to w» = if Bi > and Wj = 1 otherwise. Based on the maximal \Bi\, 
we fix num_min or num_max information bits in each round. The decimation step removes 
all fixed information bits from the factor graph along with all adjacent edges and checks 
with degree zero. This results in a new factor graph Q^ 2 \ The new source sequence 
s^ 2 ) is obtained from s^ 1 ) by XOR-ing it with all bits that were connected to the fixed 
information bits (see Figure [2]). The satisfaction messages S^}^ in the second round are 

initialized with the value of messages from the last iteration in the previous round. 
The decimation step only removes some edges and vertices from the factor graph, but 
preserves the values of messages associated with each edge. We repeat the described 
procedure (one round) until we fix all information bits from the original factor graph and 
output the sequence of information bits as w s . 



3.1 Derivation of the BiP algorithm 

We now carry out the derivations for an arbitrary profile g assuming that we know the 
vector 7 = (7x,...,7 n ), 7, ^ \fi = l,...,n, and reformulate the weighted binary 
quantization problem as bitwise Maximum A Posteriori (MAP) estimation. 

First, we factorize the probability distribution (TjQ). For c e C, we can find w e {0, l} m , 
such that c = Gw. Thus, we can write 



P(w|s;7) = n^a(w V ( a ),s a ), 



(2) 



aeC 



where i(w vw ,s fl ) = e 7a if s a = YaeV(a) w * and ^K^.s,,) = e 7a otherwise. We will 
use the sum-product (Belief Propagation) algorithm [2] to calculate marginal probabilities 
for each information bit and find the assignment for each information bit in the form 

Wj = arg max P(wjls) = arg max V P(wls) = arg max V FT if) a (wv( a ), s )- (3) 

Here, the sum over all information variables without the i-th one is shortened as XL W •• 
To calculate ([3]) using the sum-product algorithm efficiently, we simplify the original 
update equations (see [2] for the original update rules). For our problem, all messages 
M^ a and M^ >i passed in £-th iteration in the original sum-product algorithm are two- 
component vectors. Using the original update equations, we define 
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To calculate the last ratio, we will use the definition of function if) a and partition the 
set w y(a) \ {i} = W even u W odd , where W even = {x e {0, l}!^)!- 1 ] # of 1 in x is even} and 
W odd = {x e {0, l^WI" 1 ! # of 1 in x is odd}. 

We now assume that s a = and later remove this assumption. In this special case, 
we have if) a (0, w V ( a )\{i}, s ) = e< a for all vectors from W even and i/j a (0, Wy^ij, s a ) = 
e" 7a for all vectors from W odd . Conversely, ip a (l, Wy^j, s tt ) = e" 7a for W et)en and 

4(l|Wv(a)\{i}|8a) = e7a for We can substitute and write R a % = %T£X^Z% 

where A = J] Wevm rW>\« <iaK), and B = ^w odd lW)\ W We can 

express both sums (A and B) in terms of the ratios Pl--L by dividing them by the constant 
factor rW)\ W <L(0) 
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Similarly, for B, B = |(C — D). Finally, we obtain 



where we used the substitution 



+ jeV(a)\{i} 1 + 

It is easy to see that the case where s a = 1 can be captured by the given substitution. 
Using the substitution B^ a = (l-i?S a )/(l+i?Sj, we can completely rewrite equations 
(BJ and (jSJ) solely in terms of Bf\ a and S a % t obtaining thus the final message-passing 
rules. From (J6j), 

1 + K^ a 1 + 1 [ beC{ma} K b ^ 1 + Ubecma} T^gi) 

_ n&eC(i)\{a} + 'Sfe-n \l ~ ribeC(0\{a} ~ ffL>i \l 
n&£C(j)\{a} [l + ^b->j ] + l~[beC(i)\{a} [l ~~ ^6->i ^] 

and from (J7J) 

jeV(o)\{i} 

where the source message £?if-> a is defined as P4f-> a = (— l) Sa tanh(7 a ). After max_iter 
iterations, we compute the final bias using the last satisfaction messages S^j^ 

= p(o) - p(i) = i-gjS = i - n^cw ggn = nb e c W [i + g&]-n be c W [i-^j] 

4 P(0) + P(l) 1 + f£ll 1 , n n [1 , q& 1 , TT [1 Offl 1 ' 

Thus, we just obtained equations (BiP-1), (BiP-2), (BiP-4), and (BiP-5) from Figure El 



3.2 Dealing with cycles in the factor graph 



The sum-product algorithm is exact (gives exact results) when the underlying graph is 
a tree. However, many researchers reported good results even for graphs with cycles. In 
principle, short cycles cause the messages to oscilate. The oscillations can be suppressed 
using a procedure called "damping" . A similar approach was introduced in the context 
of statistical mechanics by Pretti [B] . 
Using equation we can write 

In *<i = i„ n I < 8 > 

beC(i)\{a} beC(i)\{a} 

In other words, the update rule (BiP-2) is a sum of logarithmic terms. Thus, we can use 
the arithmetic mean in this representation to calculate the output message in the £-th 
iteration by averaging the input messages from iterations £ — 1 and £ — 2. This "low-pass 
temporal filter" will prevent large changes to output messages. Using (JS]), we can write the 

output ratio after applying the damping procedure as In RfX a = \ (^bec(i)\{a} m ^bZi^ + 
Tibec(i)\{a } ln R bZ? ) > and hence 

n r ^ ] - n ( 9 ) 

beC{i)\{a] beC(i)\{a} 

Using B\\ a = (1 — -R-2. a )/(l + RfXa) an d equations (jHD and (jEJ), we obtain equation 
(BiP-3) that is used for damping in the BiP algorithm. To obtain the message BfX a in 
practice, we calculate the temporary bias message -B«^ a using equation (BiP-2). Finaly, 
BfXa i s obtained from messages B^ a and Bf~^/ using equation (BiP-3). 



4 Convergence of the BiP algorithm 

While deriving the BiP algorithm, we tacitly assumed that for some information bits 
the magnitude of the bias at the end of each round is high, \Bi\ % 1. Such bits have 
a high tendency to be fixed to some value without generating too much distortion. To 
obtain a small final distortion, this condition should be fulfiled in each round for some 
bits. From practical experiments, we know that there exist good degree distributions 
satisfying this condition, while other distributions, such as regular distributions, do not 
build up sufficient bias magnitudes in each round and thus produce very high distortion. 
This observation was also made in [8]. 

We say that the BiP algorithm converges in r-th round if there exists at least one 
information bit i e V that fulfills \Bi\ > t for some constant threshold t. Moreover, 
we say that the BiP algorithm converges if it converges in all its rounds. A part of 
our future research is to find a condition for the class of degree distributions for which 
the BiP algorithm converges. This condition could be used for construction of degree 
distributions optimized for the BiP algorithm in a manner similar to density evolution 
in analysis of LDPC codes for channel coding [7]. 

Murayama [5] developed an approach for lossy source coding based on a modified 
Belief Propagation algorithm and provided results for regular LDGM codes with a fixed 
check degree 2. This algorithm was based on a standard approach used in channel coding: 



Rate: 0.37 

p(x) = 0.2710a; + 0.2258a; 2 + 0.1890x 5 + 0.0614a; 6 + 0.2528x 13 
\{x) = 0.9522x 9 + 0.0478a; 10 

Rate: 0.5 

p(x) = 0.1787a; + 0.1762x 2 + 0.1028x 5 + 0.1147a; 6 + 0.0122a; 12 + 0.0479a; 13 + 0.1159x 14 + 

+0.2516x 39 
X(x) = 0.9988x 9 + 0.0012x 10 

Rate: 0.65 

p(x) = 0.2454x + 0.1921a; 2 + 0.1357x 5 + 0.0838x 6 + 0.1116a; 12 + 0.0029a; 14 + 0.0222x 15 + 

+0.0742a; 28 + 0.1321a; 32 
\(x) = 0.4987x 5 + 0.5013a; 6 

Rate: 0.75 

p(x) = 0.2912a; + 0.1892a; 2 + 0.0408x 4 + 0.0873a; 5 + 0.0074a; 6 + 0.1126a; 7 + 0.0926x 15 + 

+0.0187x 20 + 0.1241a; 32 + 0.0361a; 39 
X(x) = 0.8016x 4 + 0.1984a; 5 

Figure 3: List of good degree distributions used for generating the results. 

run the BP algorithm over the factor graph of an LDGM code and threshold all bits based 
on the final LLRs. However, this approach cannot be used for a general LDGM code. The 
key point here is the degree of all check nodes. This idea is strongly connected with the 
BiP algorithm. Due to sufficient number of check nodes with degree 2, the BiP algorithm 
converges in its first round. The same observation can be made in other rounds. 

We can think of the BiP algorithm as a generalized approach described by Murayama. 
In terms of the BiP algorithm, he only uses one round and decimates all information bits 
at once. The BiP algorithm utilizes more rounds via decimation and thus it can use more 
general degree distributions. By using irregular degree distributions, we can achieve near- 
optimal rate-distortion performance. Good LDGM codes can be obtained from degree 
distributions optimized for channel coding, especially for the BSC channel. This choice, 
which was also suggested in [Sj, was motivated by the work of Martinian et al. pi] who 
pointed out a tight connection between capacity achieving codes for the BEC channel 
and their dual codes used for binary quantization in the erasure case. 

5 Results 

We implemented the BiP algorithm in C++, where the messages were represented using 
single precision numbers. The update equations were manually optimized using Intel's 
SSE1 extension so that the cache memory was used in an optimal way. All results 
presented in this work were obtained using an Intel Core2 X6800 2.93GHz CPU machine 
with 2GB RAM. We ran the machine on Linux in 64 bit mode. All C++ code was 
optimized to 64 bit mode and compiled using Intel C++ 9.0 compiler. The speed of this 
algorithm (throughput) was measured while both CPU cores were utilized. We ran the 
same algorithm on both cores, resulting in a 1.7-1.8 times higher throughput. 

In Figure [3], we present degree distributions from edge perspective that were used 
for generating all results. Some distributions were obtained from the LdpcOpt site 
(htt p : //lthcwww. epf 1 . ch/research/ldpcopt/p and optimized for the BSC channel. 

To compare our results with the work of Wainwright et al. [8j , we implemented their 
algorithm while using similar optimization approaches. In the case of an irregular code 
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Figure 4: (a) rate-distortion performance plot for selected irregular codes of length n = 
10 4 . (b) distortion and throughput plots for various code lengths for irregular code with 
rate R = 0.5. In all graphs, each point represents an average over 100 trials. 



with rate R = 0.5 and length n = 5000, we achieved throughput 1013 bits/sec. The 
BiP algorithm produced the same distortion with throughput 10415 bits/sec. The BiP 
algorithm is a special case of their algorithm for w sou = Wi n f Q = 0. 

Figure H] contains simulation results for various rates and various code lengths for 
a uniform profile g. All results were generated using the following parameter values: 
t = 0.8, max_iter = 25, start_damp = 10, num_max = 0.01 x m, num_min = 0.001 x m. 
We set 7 = (7, ... , 7), where 7 = 0.8, 7 = 1.07, 7 = 1.3, 7 = 1.5 for rates R = 0.37 to 
R = 0.75, respectively. 

We now present the results for a linear profile of weights (which is important for 
applications in steganography pQ), and perform weighted binary quantization using the 
BiP algorithm. For a source sequence s of length n, we define the linear profile g = 
(qi, . . . , g n ) as gi = 2(n — i)/n. From [1] (Appendix A), the rate-distortion bound in this 
weighted case will be achieved if each bit is nipped with probability p (l) = e~^ ea /{\ + 
e~^ Ba ) for each source bit a e C. 

To find optimal values of 7 ffl , we use the following iterative approach. Start with 
7a = 7, where 7 is taken from the uniform weight case. Find an estimate of p a (l) 
(denote it p Q (l)) by quantizing k random source sequences. We calculate the estimate 
as an arithmetic average and finally use a moving average filter to smooth the resulting 
sequence. Comparing the estimate p a (l) with p a (l), we finally set 7* = 7° + c(p a (l) — 
Po(l) — (p — p)), where p and p is the arithmetic average of p a (l) and p a (l), respectively, 
and c is a constant (in practice, we use c = 3). 

Usually, 10 iterations were sufficient to obtain good results. In Figure [5j we present 
the resulting p a (l) and j a for rate R = 0.5. A cubic polynomial was fit through the data 
points (it uses a centralized variable x). In Figure [6] (b), we use this polynomial and show 
how the BiP depends on the code length. We can see that the throughput is roughly the 
same as for the non- weighted BiP. In Figure [6] (a), we present the overall comparison of 
weighted vs. non-weighted BiP algorithm for all degree distributions. Here, we use the 
ordinary BiP algorithm (7 = (7, . . . ,7)) and measure the distortion using the weighted 
norm (linear case). Although the degree distributions were taken from Figure [3] and were 
thus not optimized for the weighted case, the weighted BiP algorithm still achieves very 
good results. 
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Figure 5: Flipping probabilities for linear profile g, R = 0.5, £ = 4.544. Values of 7 a 
were obtained using an iterative approach for n = 10000 and interpolated by a cubic 
polynomial. 
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Figure 6: (a) overall comparison of weighted vs. non-weighted BiP algorithm for a linear 
profile g, n = 10000. (b) results from using weighted BiP algorithm for linear profile g 
using different code lengths, R = 0.5. In all graphs, each point represents an average 
over 100 trials. 
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6 Conclusions and future work 



We proposed a new algorithm for binary quantization based on the Belief Propagation 
algorithm with decimation over factor graphs of LDGM codes. We call the algorithm 
Bias Propagation (BiP). Using the Bias Propagation algorithm, we significantly reduced 
the complexity of binary quantization using LDGM codes in comparison with |8J. We 
believe this reduction is new and constitutes an important contribution as it allows us to 
theoretically study the algorithm in the future. We postpone this problem to our future 
work. We showed that the algorithm based on pure Belief Propagation has the same 
rate-distortion performance as the algorithm based on Survey Propagation [8]. 
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